Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve logic of unexpected processes check started by agent #2522

Merged
merged 6 commits into from
Mar 3, 2022

Conversation

nagworld9
Copy link
Contributor

@nagworld9 nagworld9 commented Mar 1, 2022

Description

The current logic of finding parent and if parent is agent, then we don't consider it as unexpected process and don't disable cgroups. That is broken if the process gets orphaned on certain scenarios after it's started by agent.

Design:
we should explicitly add an env variable to the processes we create and change the cgroupconfigurator to look for that var.


PR information

  • The title of the PR is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For information on cleaning up the commits in your pull request, see this page.
  • If applicable, the PR references the bug/issue that it fixes in the description.
  • New Unit tests were added for the changes made

Quality of Code and Contribution Guidelines

@codecov
Copy link

codecov bot commented Mar 1, 2022

Codecov Report

Merging #2522 (61ec08c) into develop (96a6399) will decrease coverage by 0.01%.
The diff coverage is 86.95%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2522      +/-   ##
===========================================
- Coverage    71.93%   71.92%   -0.02%     
===========================================
  Files          102      102              
  Lines        15166    15188      +22     
  Branches      2400     2403       +3     
===========================================
+ Hits         10910    10924      +14     
- Misses        3775     3778       +3     
- Partials       481      486       +5     
Impacted Files Coverage Δ
azurelinuxagent/common/utils/shellutil.py 79.88% <77.77%> (-0.12%) ⬇️
azurelinuxagent/common/cgroupconfigurator.py 73.25% <92.85%> (+0.49%) ⬆️
azurelinuxagent/ga/collect_telemetry_events.py 89.72% <0.00%> (-1.72%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 96a6399...61ec08c. Read the comment docs.

tests/ga/test_multi_config_extension.py Show resolved Hide resolved
@@ -605,7 +605,8 @@ def _check_processes_in_agent_cgroup(self):
current = process
while current != 0 and current not in agent_commands:
current = self._get_parent(current)
if current == 0:
# Process started by agent will have a marker and check if that marker found in process environment.
if current == 0 and not self.__is_process_env_flag_found(process):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added as alternative condition If all the above checks unsuccessful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably due the env variable check at the beginning and skip the rest of the logic if that check succeeds.

The current logic is too complicated and it would be good if we can drop it. You can add some sample telemetry to see if there are any cases where the env check is False, but the current logic is True.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we have special cases like demon, systemd-run checks and we need those even now. I think only place we can improve when we check parent process. We can get rid of this and add env variable check there.

                    current = process
                    while current != 0 and current not in agent_commands:
                        current = self._get_parent(current)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After further look I realized we need this parent check since few places like logcollector we do popen directly not going through defined code. In those scenarios, this parent pid check would catch it. So, I would say either we need all checks or update direct popen calls to shellutil. Let me know your thoughts.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is not quite what i meant, but we can sync offline... for now you can leave it as is


# Set the marker before process start
env[PARENT_PROCESS_NAME] = AZURE_GUEST_AGENT
kwargs['env'] = env
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If caller sends env arg, use the same flags otherwise extend os.enviorn along with new marker flag

@@ -605,7 +605,8 @@ def _check_processes_in_agent_cgroup(self):
current = process
while current != 0 and current not in agent_commands:
current = self._get_parent(current)
if current == 0:
# Process started by agent will have a marker and check if that marker found in process environment.
if current == 0 and not self.__is_process_env_flag_found(process):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably due the env variable check at the beginning and skip the rest of the logic if that check succeeds.

The current logic is too complicated and it would be good if we can drop it. You can add some sample telemetry to see if there are any cases where the env check is False, but the current logic is True.

@@ -640,6 +641,24 @@ def __format_process(pid):
pass
return "[PID: {0}] UNKNOWN".format(pid)

@staticmethod
def __is_process_env_flag_found(pid):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add some comments about how that variable is used, and probably rename the function to something that indicates its purpose (check that the process is a descendant of the agent) rather than its implementation (check for the variable).

azurelinuxagent/common/cgroupconfigurator.py Outdated Show resolved Hide resolved
environ = env_file.read()
if environ and environ[-1] == '\x00':
environ = environ[:-1]
match = re.search(shellutil.PARENT_PROCESS_NAME + "=" + shellutil.AZURE_GUEST_AGENT, environ)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like we do not need a regex for this check

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is how sample output looks. I feel regex would simplify the parsing.

[root@edpm26zyjw azureuser]# cat /proc/1156/environ 
OLDPWD=/etc/sysconfig/network-scripts_AZURE_GUEST_AGENT_DAEMON_VERSION_=9.9.9.9PATH=/sbin:/usr/sbin:/bin:/usr/binPWD=/etc/sysconfig/network-scriptsLANG=en_US.UTF-8PARENT_PROCESS_NAME=AZURE_GUEST_AGENTSHLVL=2_=/sbin/dhclient

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we just use "{0}={1}".format(shellutil.PARENT_PROCESS_NAME, shellutil.AZURE_GUEST_AGENT) in environ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kevinclark19a yes, that is what i was trying to point out, sorry for not being clear... I did not see any pattern in the re.search call, so this is basically a "contains".

Now @nagworld9 if you want to be more strict than simple "contains", you should match whatever separators the file uses.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I see your point and fixed it, thanks.

tests/ga/test_multi_config_extension.py Show resolved Hide resolved
tests/ga/test_multi_config_extension.py Show resolved Hide resolved
narrieta
narrieta previously approved these changes Mar 2, 2022
Copy link
Member

@narrieta narrieta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one minor comment about the use of re.search, otherwise LGTM (please look at the comment)

@nagworld9 nagworld9 merged commit c42343c into Azure:develop Mar 3, 2022
@nagworld9 nagworld9 deleted the TASK-12908142 branch March 3, 2022 23:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants